The race to sequence the Human Genome is not a 100 yard dash, but a marathon. Although the Human Genome Project ...
ahead of schedule, and a number of genes have been identified, we have just begun to get a glimpse of what specific ...
how we might be able to better use this knowledge for therapeutic interventions. Teasing apart the interactions of genes and
... throughout the life cycle, and correlating changes with health and disease will take even more ...

But with complete sequences, and the possibility of cross-species comparisons, we can expect new insights and speeding up ...

... what functions are ... with specific sequences. Sequencing
proteins (and then determining the structure of proteins, and the function of proteins) comes next. Progress is being made with
proteins, but sequencing of carbohydrates is even more difficult.







Related glossaries include: Applications: Functional genomics, Proteomics
Informatics: Algorithms, Bioinformatics, In silico & molecular modeling
Technologies: Chromatography & electrophoresis, Mass spectrometry

Biology: Proteins, Protein structures, SNPs & genetic variations, Sequences - DNA & beyond

alignment: The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of
amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology. [NCBI Bioinformatics]

... global alignment, ... optimal alignment, pairwise alignment. Related terms: BLAST, BEAUTY,
BLAST2, FASTA, gapped BLAST, Needleman-Wunsch, Smith-Waterman alignment

annotation: Bioinformatics glossary

assembly: The term used to describe the process of using a computer to join up bits of sequence into a larger whole. [Peer Bork,
Richard Copley "Filling in the gaps" Nature 409: 818-820, 15 Feb. 2001]

This is different from assembly language, and the source of some confusion between biologists and computer scientists.
Related term: contig assembly
automation: See sequencers- automation 

BEAUTY (BLAST Enhanced Alignment Utility): An enhanced version of the NCBI's BLAST database search tool. BEAUTY ...

BLAST (Basic Local Alignment Search Tool): Software program from NCBI for searching ...

Faster but less rigorous than FASTA or Smith-Waterman.
BLAST2: A newer release of BLAST that allows insertions or deletions in the aligned sequences. Gapped alignments may be more
biologically significant. Synonymous with gapped BLAST.



BLAT: BLAST-Like Alignment Tool, Jim Kent http://www.genomeblat.com/genomeblat/index.asp

base caller: A program that analyzes trace data in chromatogram files and assigns a base for each peak. Some programs, for
example, Phred and TraceTuner, also produce a corresponding quality value for each base. [DNA Sequence Glossary, Geospiza]
http://www.geospiza.com/com...

cDNA sequencing: Within the DOE [US Dept. of Energy] Human Genome Program, support began in 1990 for high-throughput
analyses of cDNAs, the sturdy representatives of the cell's fragile messenger RNAs (mRNAs) for gene expression. In 1994
I.M.A.G.E. was founded to coordinate worldwide, public-sector efforts in cDNA analyses. For a few years, most cDNA sequencing was
used to catalogue cDNAs by short sequence reads called expressed sequence tags (ESTs). In the fall of 1996, several research
teams worldwide announced plans for complete sequencing of cDNAs. The following spring, a series of international cDNA workshops
began. Building on the Integrated Molecular Analysis of Genome Expression (I.M.A.G.E.) consortium, the workshops' objectives
are to extend the infrastructure I.M.A.G.E. has provided since 1994 to the challenges of complete cDNA sequencing.
http://www.ornl.gov/meetings/...



chain termination sequencing method: See Sanger sequencing (under Maxam-Gilbert & Sanger).
chemical cleavage sequencing: See Maxam-Gilbert sequencing.
chemical degradation sequencing: See Maxam-Gilbert sequencing.

chromosome-specific shotgun sequencing: Chromosomes are separated by pulsed-field gel electrophoresis and purified DNA is
used to construct small-insert shotgun libraries. Clones from these libraries are subsequently sequenced. Plasmodium falciparum
Genome Project Sequencing Strategies, NCBI, US, 2001 http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/Pfalciparum1.html

clone, cloning: Cell biology glossary. Related term: library

clone pickers: A number of institutions and biotech companies have designed robotic "clone pickers" to automate this picking and
sorting process. The collection of clones is grown on large media plates, and optical scanners detect individual clones for the picking
arm. The robotic arm picks each clone with a needle, transfers the clone to growth media arrayed in 96- or 384-well microtiter
plates, sterilizes the needle and continues the picking process. Commercial clone pickers capable of picking and sorting over 10...
clones per hour into arrayed microtiter plates are available. [CHI High Throughput Genomics] Genomic Report, Dec. 2001.

consensus sequence: A theoretical representative nucleotide or amino acid sequence in which each nucleotide or amino acid is the
one which occurs most frequently at that site in the different forms which occur in nature. The phrase also refers to an actual
sequence which approximates the theoretical consensus. A known CONSERVED SEQUENCE set is represented by a consensus
sequence. Commonly observed supersecondary protein structures (AMINO ACID MOTIFS) are often formed by conserved
sequences. [MeSH, 1991]

A sequence of DNA, RNA, protein or carbohydrate derived from a number of similar molecules, which comprises the essential
features for a particular function. [IUPAC Bioinorganic]

Related term: Protein structures: amino acid motifs
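
The "most frequent at each site" rule in the MeSH definition can be illustrated with a short Python sketch. It assumes the sequences are already aligned to equal length; the function name consensus and the example sequences are hypothetical and not taken from any cited source.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Return the most frequent symbol at each column of pre-aligned sequences."""
    if not aligned_sequences:
        return ""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_sequences):
        # Counter.most_common(1) gives the symbol with the highest count;
        # on a tie, the symbol seen first in the column wins.
        result.append(Counter(column).most_common(1)[0][0])
    return "".join(result)


if __name__ == "__main__":
    print(consensus(["ACGT", "ACGA", "TCGT"]))
```

Real consensus tools usually also report how strongly each position is conserved; this sketch keeps only the majority symbol.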

conserved sequence: A sequence of amino acids in a polypeptide or of nucleotides in DNA or RNA that is similar across multiple
species. A known set of conserved sequences is represented by a CONSENSUS SEQUENCE. AMINO ACID MOTIFS are often
composed of conserved sequences. [MeSH, 1993]

A "highly conserved sequence" is a DNA sequence that is very similar in several different kinds of organisms. Scientists regard these
cross-species similarities as evidence that a specific gene performs some basic function essential to many forms of life and that
evolution has therefore conserved its structure by permitting few mutations to accumulate in it. [NHGRI]

contig: Group of cloned (copied) pieces of DNA representing overlapping regions of a particular chromosome. [DOE]

A contiguous (i.e. without gaps) stretch of DNA sequence which has been assembled solely on the basis of direct sequencing
information, i.e. sequencer reads. Note however that 'contig' is used in other contexts in genomics to mean a contiguous assembly of
something (e.g. clones), without necessarily implying that all the bases in the assembly have been determined. Ensembl Glossary
http://vega.sanger.ac.uk/Mus_m...

Narrower terms: initial sequence contigs, merged sequence contigs. Related terms: clone, contig assembly, scaffolds.

Published genome sequence has many gaps and interruptions. Concept of "contig" is crucial to our understanding of current
limitations. [David Galas "Making sense of the sequence" Science 291 (5507): 1257, Feb. 16, 2001]

contig assembly: One of the most difficult and critical functions in DNA sequence analysis is putting together fragments from ...
overlapping segments. Some programs do this better than others, particularly when dealing with sequences containing gaps. [Laura
De Francesco "Some things considered" Scientist 12(20): 18, Oct 12, 1999] http://www.the-scientist.com/yr1998/oct/profile1_981012.html

NCBI Contig Assembly and Annotation Process, National Center for Biotechnology Information, US, Feb. 2001
http://www.ncbi.nlm.nih.gov/genome/guide/build.html

contig mapping: Maps glossary

coverage (or depth): The average number of times that a nucleotide is represented by a high-quality base in a collection of random
raw sequence. Operationally, a 'high-quality base' is defined as one with an accuracy of at least 99% (corresponding to a Phred
score of at least 20). [UC-Santa Cruz, US, Human Genome Project Working Draft Terminology, 2001]
http://genome.ucsc.edu/goldenPath/term.html
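
The link between a Phred score of 20 and 99% accuracy follows from the standard Phred relation Q = -10 log10(P), where P is the probability that a base call is wrong. A minimal Python sketch of that relation and of average coverage is given here; the function names and the example read counts are hypothetical, chosen only for illustration.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def average_coverage(high_quality_bases_per_read, target_length):
    """Average number of high-quality bases covering each position of the target."""
    return sum(high_quality_bases_per_read) / target_length


if __name__ == "__main__":
    # Phred 20 corresponds to an error probability of about 0.01 (99% accuracy).
    print(phred_to_error_probability(20))
    # One error in 10,000 bases corresponds to a Phred score of about 40.
    print(error_probability_to_phred(1 / 10_000))
    # 250 reads of 400 high-quality bases over a 10,000-base clone.
    print(average_coverage([400] * 250, 10_000))
```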

DNA library: Combinatorial libraries & synthesis 

DNA reaction setup: Each individual DNA sequencing project defines the criteria for reaction setup, which include: 1) the type of
template; 2) the amount of DNA sequence that needs to be determined from each template; and 3) how many templates are being
analyzed. Quantifying each template is time consuming; template quantities are usually estimated based on standard purification
scale and protocols to expedite the sequencing process. The sequencing method used is based on the original Sanger (dideoxy
chain termination) sequencing methods developed in the 1970s. [CHI High Throughput Genomics] Genomic Report, Dec. 2001

data handling: Before the advent of capillary-based automated sequencers, DNA sequence data was produced at a comparatively
inefficient rate, therefore data handling and analysis were not as great a concern. DNA sequencing data production rates of a single
lab can now exceed one million bases per day. [CHI High Throughput Genomics] Genomic Report, Dec. 2001

Related terms: Bioinformatics glossary

de novo sequencing: Determination of sequences (of genes or amino acids) whose sequence is not yet known. Can be done with
LC/MS/MS or nanoelectrospray MS/MS. [CHI Proteomics report]

From the Latin "de novo", from the beginning. See also Mass spectrometry glossary

depth: See under coverage 



detection methods: When DNA sequencing was first developed in the 1970s, detection methods were relatively primitive compared
with today's technology. Instead of fluorescent tags, DNA fragments were labeled with radioactive tags. DNA fragments were then
resolved, based on size, on vertical polyacrylamide gels and the gels exposed to X-ray film to capture the DNA fingerprint image
of the separated fragments. The X-ray film was viewed by a person and scored manually. Sequencing was limited to only 200-300
bases of data from a single sample analysis and many reiterative steps were required to acquire just a few thousand bases of DNA
sequence data. Development of fluorescent detection technology has allowed the data collection and analysis processes to be
streamlined into a single step. DNA is separated on an automated DNA sequencer and the fluorescent image of DNA migrating
through the acrylamide gel is captured by CCD cameras similar to those found in common home video recorders. Software specific to
the automated sequencer automatically tracks and interprets the image (i.e., determines the fluorescent identity and hence the
specific nucleotide at the end of each fragment), processes the data, and "calls", or interprets, the order of bases in the DNA
sequence. The data can then be used in downstream sequence assembly processes. [CHI High Throughput Genomics] Genomic
Report, Dec. 2001

dideoxy sequencing: See Sanger sequencing under Maxam-Gilbert & Sanger
directed sequencing: See under shotgun sequencing

draft genome sequence [human]: The sequence produced by combining the information from individual sequenced clones (by
creating merged sequenced contigs and then employing linking information to create scaffolds) and positioning the sequence along
the physical map of the chromosomes. (Nickname "golden path".) [Univ. of California Santa Cruz Human Genome Project Working
Draft terminology] http://genome.ucsc.edu/goldenPath/term.html

Sequence with lower accuracy than a finished sequence; some segments are missing or in the wrong order or orientation. [History of
the Human Genome Project "A Genome Glossary" Science 291: pullout chart, Feb. 16, 2001]

NHGRI Rapid Data Release Policies, NHGRI, US, 2003 http://www.genome.gov/page.cfm?pageID=10506537
See also: working draft, human sequence

dynamic programming methods: Assure the optimal global (Needleman and Wunsch 1970; Sankoff and Kruskal 1983) or local
(Smith, et al. 1981) alignment by simply exploring all possible alignments and choosing the best. ["Pedestrian guide to analysing
sequence databases" Burkhard Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html

These methods allow the introduction of artificial gaps in aligned sequences to create an optimal alignment.

Related terms: alignment, gap penalties, global alignment, local alignment, Needleman-Wunsch, sequencing algorithms
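
To make the idea concrete, here is a minimal Python sketch of global alignment scoring by dynamic programming in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, gap -2), the linear gap penalty, and the function name are assumptions chosen for illustration; published implementations use substitution matrices and other gap models.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap   # residue of a aligned to a gap
            gap_in_a = score[i][j - 1] + gap   # residue of b aligned to a gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The table fill takes time proportional to the product of the two sequence lengths, which is why every possible alignment is covered without being enumerated one by one.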

flow cytometry : Cell biology glossary 

FASTA: The first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning
the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated
("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes
gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup"
variable which specifies the size of a "word". (Pearson and Lipman) [NCBI Bioinformatics] More rigorous and slower than BLAST.
http://fasta.bioch.virginia.edu/
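
The word-scanning step can be sketched in a few lines of Python. This is only an illustration of indexing words of size k-tup and counting hits per diagonal, not the FASTA program itself; the function names and example sequences are hypothetical.

```python
from collections import defaultdict


def word_hits(query, subject, ktup=2):
    """Return (query_pos, subject_pos) pairs where a word of length ktup matches."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    hits = []
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            hits.append((i, j))
    return hits


def hits_per_diagonal(hits):
    """Count word hits on each diagonal (subject_pos - query_pos)."""
    diagonals = defaultdict(int)
    for i, j in hits:
        diagonals[j - i] += 1
    return dict(diagonals)


if __name__ == "__main__":
    hits = word_hits("ACGT", "TACG", ktup=2)
    print(hits, hits_per_diagonal(hits))
```

A larger k-tup gives fewer, more specific hits, so the search runs faster but is less sensitive, which is the trade-off the entry describes.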






finished sequence - human: Sequence in which bases are identified to an accuracy of no more than 1 error in 10,000 and are
placed in the right order and orientation along a chromosome with almost no gaps. [History of the Human Genome Project "A Genome
Glossary" Science 291: pullout chart, Feb. 16, 2001]

Each base pair has been sequenced 8-10 times, with the remaining gaps limited by present technology. ... No eukaryotic genome
sequenced so far has been totally sequenced - current technology isn't up to it. Highly repetitive regions (not expected to contain
many protein-coding genes) can be impossible (or very difficult) to clone. One definition of "finished" is that fewer than one base in
10,000 is incorrectly assigned. [Peer Bork, Richard Copley "Filling in the gaps" Nature 409: 818-820, 15 Feb. 2001]

"At some level it's a little arbitrary when you declare a sequence essentially complete," says NHGRI Director Francis Collins... "The



http://www.genomicglossaries.com/content/sequencing_gloss.asp



1/31/2005 






definition of finished is evolving. Our definition today is different from 10 years ago. Ten years ago we didn't even think at the level
of genomes," says Laurie Goodman, editor of Genome Research. "I think the community at large should define done. Not everyone is
going to agree, but when you're using the word you should define what it means." Francis Collins says "You're done when you've
exhausted the standard methods for closing the gaps. There should be some biological reason why those last bits of sequence elude
you - not because you just didn't bother." ["Are we there yet?" The Scientist :12, July 19, 1999]
http://www.the-scientist.com/yr1999/july/hopkin_p12_990719.html

Related terms: finished clone, Human Genome Project, post-genomic. Genomics glossary

finishing standards - Human Genome Project: The International Human Genome Consortium recognizes the need to maximize
the likelihood that the finished human genome sequence meets consistent standards of quality across all participating genome
centers, and to adopt uniform practices and annotation for regions that present problems for current sequencing technology. At the
Seventh International Meeting, the Consortium approved a detailed set of consensus standards for what should be considered a
finished sequence, a set of rules for dealing with regions that are difficult to resolve, and a set of finishing annotation tags to be
submitted with accessions. Finishing Standards for the Human Genome Project - Version September 7, 2001. Standard Finishing
Practices and Annotation of Problem Regions for the Human Genome Project (Genome Sequencing Center, Washington Univ., St.
Louis School of Medicine, US) http://www.genome.wustl.edu/Overview/...

flow sorting: Cell biology glossary

full shotgun coverage: The coverage in random raw sequence needed from a large-insert clone to ensure that it is ready for
finishing; this varies among centers but is typically 8-10 fold. Clones with full shotgun coverage can usually be assembled with only a
handful of gaps per 100 kb. [Univ. of California Santa Cruz Human Genome Project Working Draft terminology]
http://genome.ucsc.edu/goldenPath/term.html



GRAIL (Gene Recognition and Analysis Internet Link) software http://compbio.ornl.gov/Grail-1.3/ A suite of tools designed to provide
analysis and putative annotation of DNA sequences both interactively and through the use of automated computation. [Grail
Overview, Oak Ridge National Lab, US] http://compbio.ornl.gov/...

The name ... to Walter Gilbert's description of the Human Genome Project as the 'Holy Grail' of molecular biology.

gap: A space introduced into an alignment to compensate for insertions and deletions in one sequence relative to another. To
prevent the accumulation of too many gaps in an alignment, introduction of a gap causes the deduction of a fixed amount (the gap
score) from the alignment score. Extension of the gap to encompass additional nucleotides or amino acids is also penalized in the
scoring of an alignment. [NCBI Bioinformatics]

Narrower term: gap penalties

gap penalties: An important problem is the treatment of gaps, i.e., residues inserted (or deleted) to optimise the objective function.
Usually, gap penalties (cost of inserting and extending gaps) are chosen to be length dependent. Typically the cost of
gap elongation is 5-10 times lower than is the cost for introducing a gap (gap open). The optimal choice of gap penalties
... in detail, on particular ... ["Pedestrian guide to analysing sequence
databases" Burkhard Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html

Related terms: alignment, dynamic programming methods. Broader term: gaps
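
A length-dependent penalty of this kind is often written as an affine function: one cost to open a gap and a smaller cost for each further position. The Python sketch below uses assumed values (open 10, extend 1, a ratio inside the 5-10 fold range quoted above) and a hypothetical function name; it follows the convention that the first gap position is charged the opening cost only.

```python
def affine_gap_cost(length, gap_open=10.0, gap_extend=1.0):
    """Cost of a single gap spanning `length` positions under an affine model."""
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0.0
    # The first position pays the opening cost; each further position pays
    # only the cheaper extension cost.
    return gap_open + (length - 1) * gap_extend


if __name__ == "__main__":
    one_long_gap = affine_gap_cost(4)
    four_short_gaps = 4 * affine_gap_cost(1)
    print(one_long_gap, four_short_gaps)
```

Under these assumed values one gap of four positions costs 13 while four separate single-position gaps cost 40, so the model favours a few long gaps over many scattered ones.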

genotype: The genetic constitution of an organism as revealed by genetic or molecular analysis, i.e. the complete set of genes, both
dominant and recessive, possessed by a particular cell or organism. [IUPAC Biotech]

The observed alleles at a genetic locus for an individual. [NHLBI]

An organism's genetic makeup, as revealed through molecular analysis. [CHI SNPs Update report]
genotype to phenotype: Genomics glossary

genotyping: The process of assessing genetic variation present in an individual. CHA Cambridge Healthtech Advisors, Clinical
Genomics: The Impact of Genomics on Clinical Trials and Medical Practice report, 2004

Used for diagnosis, drug efficacy, and toxicity. Utilizes genomic DNA that, after digestion, reacts with a SNP array to obtain an
individual SNP pattern. These variations can for instance provide information about the diagnosis of a certain disease, or the
effectiveness or side effect of a certain drug. Microarrays in Medicine: Arrays of Possibilities, April 26-27, 2004, Boston,
Massachusetts

The determination of relevant nucleotide-base sequences in each of the two parental chromosomes. [CHI SNPs Update report]

May refer to identifying one or more, up to the entire gene sequence of an organism. Compare phenotype

Genotyping implies (though I haven't found this in print) determining known variants, as opposed to discovery of new ones.

Related terms: Genetic variations glossary. Broader term: sequencing. Narrower terms: haplotyping, viral genotyping

genotyping technologies: The ideal SNP genotyping system would offer assay uniformity, high throughput capacity, ease of assay
design, and most importantly, accurate and reliable results. Various methodologies exist for detecting the alternative alleles of SNPs,
but each has its own advantages and disadvantages, and offers different levels of throughput capacity ... There are essentially three
different possibilities: 1) few SNPs, many samples, 2) many SNPs, few samples, and 3) many SNPs, many samples. While these
categories may seem obvious, it is critical when optimizing a genotyping project to determine which of these categories is most
important, since different technologies work best for each group. ... high-throughput [genotyping methods] analysis have only
recently been optimized. Even so, there is still no "gold standard" and different companies have devised their own technologies for
genotyping. [CHI High Throughput Genomics] Genomic Report, Dec. 2001.

Genotyping technologies have proliferated rapidly in recent years, and at least one hundred methods are currently available for
detecting the genotypes of individual SNPs. In diploid organisms, such as humans, the linkage of particular SNP genotypes on each
chromosome in a homologous pair (the haplotype) may provide additional information not available from SNP genotyping alone.
[Lawrence Berkeley Lab "High Throughput Haplotyping of Diploid Organisms", 2001] http://www.lbl.gov/Tech-Transfer/collaboration/techs/lbnl1745.html

"Variations on a gene: diverse technologies converge to detect human genetic variation" Amy L. Francis, Scientist 14 (15): 35, Jul ...
2000 http://www.the-scientist.com/...

glass capillary electrophoresis: Chromatography & electrophoresis glossary. A type of automated sequencer.
global alignment: The alignment of two nucleic acid or protein sequences over their entire length. [NCBI Bioinformatics]
Related term: dynamic programming methods. Broader term: alignment

Golden Path: The assembled genome sequence. Contigs are the principal building blocks of "Golden path". The term was originally
applied to the human genome assemblies coordinated at UCSC, but some people now use it for any genome assembly. Ensembl
Glossary http://vega.sanger.ac.uk/Mus_musculus/helpview?se=1&kw=glossary

haplotype: A particular pattern of sequential SNPs found on a single chromosome. These SNPs tend to be inherited together over
time and can serve as disease-gene markers. The examination of single chromosome sets (haploid sets), as opposed to the usual
chromosome pairings (diploid sets), is important because mutations in one copy of a chromosome pair can be masked by normal
sequences present on the other copy. [CHI Predictive Pharmacogenomics report]

The linear, ordered arrangement of alleles on a chromosome. Haplotype analysis is useful in identifying recombination events.
[NHLBI]
The genetic constitution of individuals with respect to one member of a pair of allelic genes, or sets of genes that are closely linked
and tend to be inherited together such as those of the MAJOR HISTOCOMPATIBILITY COMPLEX. [MeSH, 1987]

The resulting genotype for a chromosomal locus constructed by combining the allele assignments from two or more genetically
linked allele systems. Each individual will have two haplotypes for the "megalocus", representing the ...
... and the haplotypes will segregate in a pedigree following Mendelian inheritance. [C. Helms,
Washington Univ., St. Louis, US "Haplotyping" Nov. 1990] http://hdklab.wustl.edu/lab_manual/...

From "haploid genotype." The key idea is that ... often travel in packs. This offers the hope that pharmacogenomics will not hopelessly fragment pharmaceutical ...

Related terms: haplotyping, haplotyping technologies; Cell biology glossary: diploid, haploid, ploidy; Maps: haplotype map,
HapMap; Narrower terms: SNPs & genetic variations ... block, SNP haplotype

haplotyping: ... diploid organisms have two copies of each chromosome. A given single-base position may be
homozygous for the wild type base (each chromosome has the normal allele), homozygous for a SNP base (each chromosome
has the altered allele), or heterozygous for two different bases (one chromosome has the normal allele and the other has the
abnormal allele).
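
The three cases for a single-base position can be written as a small Python function. This is a sketch only: the function name and the returned labels are hypothetical, and it assumes the base on each chromosome copy is already known.

```python
def classify_position(base_1, base_2, wild_type):
    """Classify one single-base position given the base on each chromosome copy."""
    if base_1 == base_2:
        if base_1 == wild_type:
            return "homozygous wild type"
        return "homozygous SNP"
    return "heterozygous"


if __name__ == "__main__":
    for pair in [("A", "A"), ("G", "G"), ("A", "G")]:
        print(pair, classify_position(pair[0], pair[1], wild_type="A"))
```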

Haplotyping involves grouping subjects by haplotypes, or particular patterns of sequential SNPs, found on a single chromosome.
These SNPs tend to be inherited together over time and can serve as disease-gene markers. The examination of single chromosome
sets (haploid sets), as opposed to the usual chromosome pairings (diploid sets), is important because mutations in one copy of a
chromosome pair can be masked by normal sequences present on the other copy. [CHI SNPs Update report]

Genes tend to travel in packs. This is good news for pharmacogenomics.

Broader terms: genotyping, sequencing

... interview with Mark Daley, Whitehead Institute, CHI's GenomeLink
14.2 http://www.healthtech.com/newsarticles/issue14_2.asp

haplotyping technologies: Include microarrays, mass spectrometry, sequencing

Hidden Markov Models HMM: Molecular Modeling Useful for analyzing protein sequences. 

high-throughput sequencing: For high throughput sequencing to work effectively, the DNA sequencing templates must be ...
However, template purification has historically been a bottleneck due to its labor-intensiveness: hundreds ...
... consistency in purification methods is ...
purification platforms have been developed that replicate the steps used in a manual template purification protocol ... a
greater level of consistency. Almost all of these purification platforms work flexibly with 96- and 384-well ... formats ...
can accommodate a wide range of purification protocols and DNA template sources. ... A high ...
... or a commercial site would have ... two and ten ABI 3700 or ...
... collection of clones or PCR products ... genome screening projects and ...
... large DNA ... BACs and PACs. Finally, ...
such as Celera Genomics or the Sanger Centre, would have between 10 and 300 sequencers, with the capacity to ...
genome sequencing projects of various organisms. [CHI High Throughput Genomics] Genomic Report, Dec. 2001

Compare low throughput sequencing, medium throughput sequencing. 

homology: Narrower terms: sequence homology; sequence homology, nucleic acid. Functional genomics glossary:
homology. Related terms: homolog (homologue), similarity, ortholog, paralog, xenolog ...



human sequence: See draft sequence, finished sequence, published sequence, working draft

initial sequence contigs: Derived from sequenced ... [David Galas "Making
sense of the sequence" Science 291: 1257-12...



library; library, genomic: Cell biology glossary

local alignment: The alignment of some portion of two nucleic acid or protein sequences. [NCBI Bioinformatics]

Best alignment method for sequences for which no evolutionary relatedness is known. See Smith-Waterman alignment. Compare
global alignment
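
A local alignment score in the style of Smith-Waterman can be sketched in Python as follows. The scoring values (match +2, mismatch -1, gap -2), the linear gap penalty, and the function name are assumptions for illustration. Unlike a global alignment, no cell score is allowed to fall below zero, so the best-scoring portion of the two sequences is found wherever it lies.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at zero lets a new local alignment start at any cell.
            cell = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTACGTAAA", "GGACGTCC"))
```

In the example the two sequences share only the stretch ACGT, and the score reflects those four matches regardless of the unrelated flanking residues.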



MALDI-TOF: Mass spectrometry glossary

http://www.ornl.gov/... (Oak Ridge National Lab, US)

megalocus: See under haplotype

microsequencing: Sequencing of proteins or peptides ... sometimes for ...
single base sequencing. Related terms: single base extension array; ...



multiple sequence alignment: An alignment of three or more sequences with gaps inserted in the sequences such that residues with
common structural positions and/or ancestral residues are aligned in the same column. ClustalW is one of the most widely used
multiple sequence alignment programs. [NCBI Bioinformatics]

The concept of dynamic programming cannot be extended to align more than three sequences optimally (Murata 1990). A way
around this problem is to first find optimal pairwise alignments and to then merge the pairs. ["Pedestrian guide to analysing sequence
databases" Burkhard Rost, Reinhard Schneider, 1999] http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html

Related term: Hidden Markov Models HMM

multiplex sequencing: The many applications of multiplex sequencing technology include the testing of any sample carrying
nucleic acid material from a wide range of sources including: human clinical samples (e.g. screening for high risk subtypes of the
human papilloma virus, screening blood donations for infectious organisms, identifying human genetic variations and SNPs, and
selecting participants for clinical trials), environmental samples, the food industry, and agribusiness. Nucleic-Acid Based
Technologies: Profiling PCR, June 24-26, 2002, Washington, DC

See also under Maxam-Gilbert sequencing.

nanopore sequencing: A nanopore can probe a long polymer of DNA and directly convert sequence information into electrical
signals at very high speed. Because nanopore probing combines the power of single molecule approaches with the abilities of a
high throughput system, it can be applied to SNP genotyping and multiplex haplotyping of genomic samples without the need to
amplify selected molecules of interest. Using recently perfected solid-state nanopores, future applications could include rapid,
complete, but inexpensive sequencing of a human genome. "Nanopore Sequencing: Implications in Personalized Medicine" Dr. ...
Branton, Harvard University, Human Genome Discovery, Molecular Medicine Marketplace, Mar. 17-19, 2003, Santa Clara CA

Biological membrane pores are also being investigated for rapid single DNA molecule analysis. These nanometer-sized pores are
constructed from α-hemolysin channels, isolated from bacteria, placed in Teflon horizontal bilayers. A single DNA molecule is pulled
through a nanopore in only hundreds of microseconds. The major challenge for this technology is developing the capability of resolving
the genetic code of the DNA fragment in the brief time that it is traversing the nanopore. [CHI High Throughput Genomics] report

Related term: Nanoscience & Miniaturization glossary: nanopore

Needleman-Wunsch: Global sequence alignment algorithm. [Needleman, S. B., Wunsch, C. D., "A general method applicable to the
search for similarities in the amino acid sequence of two proteins" J. Mol. Biol. 48: 443-453, Mar. 1970] Related terms: dynamic
programming; Algorithms & data management glossary, Molecular modeling glossary

"online" automated DNA sequencers: These instruments integrate reaction thermocycling, sample purification, capillary
electrophoresis, and signal detection into a single closed loop system. The DNA sample is simply injected at the beginning of the ...
and all the protocols are automatically performed without additional human intervention. [CHI High Throughput Genomics] Genomic
Report, Dec. 2001.

optimal alignment: An alignment of two sequences with the highest possible score. [NCBI Bioinformatfcs] 



perspective is the 'mathematical' optimal alignment. This is the alignment that optimises a given objective function, e.g. to find the
alignment with the highest number of pairwise identical residues. FASTA and BLAST are not guaranteed to find such a
mathematically optimal alignment. "Pedestrian guide to analysing sequence databases" Burkhard Rost, Reinhard Schneider 19
http://cubic.bioc.columbia.edu/papers/1999_pedestrian/paper.html
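
The objective function mentioned above, the number of pairwise identical residues, can be sketched in a few lines. The function name and the two candidate alignments are hypothetical and serve only to show how competing alignments are compared under one objective.

```python
def count_identities(aligned_a, aligned_b, gap="-"):
    """Count columns where both aligned sequences carry the same residue.

    Both inputs must already be aligned (equal length, gaps written as '-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)


# Two hypothetical candidate alignments of ACGT against AGT.
candidates = [("ACGT", "A-GT"), ("ACGT", "AGT-")]

# The optimal alignment under this objective is the candidate with the most identities.
best = max(candidates, key=lambda pair: count_identities(*pair))

if __name__ == "__main__":
    print(best, count_identities(*best))
```

An exhaustive method evaluates every possible alignment against the objective; heuristic tools such as FASTA and BLAST examine only a subset, which is why they cannot guarantee the optimum.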



pathogen sequencing: In the future, more pathogens will have their genomes completely sequenced to determine not only how 
a pathogen causes disease, but what, if any, treatments will be most effective. The DNA sequences of viruses like HIV, human
papilloma virus (HPV), and hepatitis C (HCV) are already being characterized and therapies prescribed based on this genetic
information. To perform these types of diagnoses, DNA sequencing will have to become faster, more cost effective, simpler to
perform, and more accessible to clinical laboratories. [CHI High Throughput Genomics] Genomic Report, Dec. 2001.

Phrap: Assembler software. http://www.phrap.com/background.htm

Phred: Base calling program for DNA sequence traces; ... developed by Drs. Phil Green and Brent Ewing, and is distributed under
license from the University of Washington. http://www.phrap.org/

protein sequence: Proteins glossary.
protein shotgun sequencing:
Broader term: shotgun sequencing

published working drafts - human genome: International Human Genome Sequencing Consortium special issue: Nature 409
(6822) 15 Feb. 2001 http://www.nature.com/nature/journal/v409/n6822/ http://www.nature.com/

Human Genome [Celera Genomics sequence] special issue: Science 291 (5507) Feb. 16, 2001
http://www.sciencemag.org/content/vol291/issue5507/index.shtml



random sequencing: See under shotgun sequencing 

resequencing: A previously sequenced site is resequenced for SNP discovery or other purposes. [CHI SNPs report]

Eric Lander, director of the Whitehead Institute's Center for Genome Research and professor of biology at MIT, notes: "The human
[...] but it will be resequenced thousands of times, in order, for example, to unravel the
polygenic factors underlying human susceptibilities and predispositions ... Re-sequencing will also provide the ultimate tool for
genotyping studies." [E. Lander "The New Genomics" Science 274: 536, 25 Oct. 1996]

Related terms: finished sequence, finishing standards, published working drafts, rough drafts - human genome, working
SNP Single Nucleotide Polymorphism: SNPs & Genetic Variations 

SNP discovery: Involves finding new SNPs ... tools are just beginning to emerge and many more robust technologies are needed.
[NIH Methods for Discovering and Scoring Single Nucleotide Polymorphisms, Request for Applications, Jan. 9, 1998]
http://grants.nih.gov/

SNP scoring: Involves methods for determining the genotypes of many individuals for particular SNPs that have already been
discovered ... tools are just beginning to emerge and many more robust technologies are needed. [NIH Methods for Discovering
and Scoring Single Nucleotide Polymorphisms, Request for Applications, Jan. 9, 1998] http://grants.nih.gov



Sanger sequencing: See under Maxam-Gilbert sequencing.

scaffold: An ordered set of contigs placed on the chromosome. [NCBI, Human Genome Home "Contig Assembly Process",
Feb. 2001] http://www.ncbi.nlm.nih.gov/

A series of contigs that are in the right order but are not necessarily connected in one continuous stretch of sequence. ["History of the
Human Genome Project: A Genome Glossary" Science 291: pullout chart, Feb. 16, 2001]
The definition of a scaffold appears to be quite different in the Science and Nature draft published sequences. [David Galas
"Making sense of sequence" Science 291: 1257, Feb. 16, 2001] This is also different from the scaffold defined in Drug discovery
and development glossary.

The result of connecting contigs by linking information, such as paired-end reads from plasmids, paired-end reads from BACs lo
mRNAs, or other sources. The contigs in a scaffold are ordered and oriented with respect to one another. [Univ. of California Santa
Cruz Human Genome Project Working Draft terminology] http://genome.ucsc.edu/goldenPath/term.html
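
The idea that contigs in a scaffold are "ordered and oriented with respect to one another" can be pictured as a small data structure. The sketch below is illustrative only: the class name, the use of runs of N to stand for gaps of unknown sequence, and the fixed gap length are all assumptions, not part of the UCSC terminology.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlacedContig:
    """A contig plus its orientation within a scaffold (hypothetical structure)."""
    name: str
    sequence: str
    reverse: bool = False  # True if the contig lies on the opposite strand


_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def scaffold_sequence(contigs: List[PlacedContig], gap_length: int = 10) -> str:
    """Concatenate contigs in scaffold order, flipping reversed ones.

    Gaps between contigs are written as runs of N; the gap length is an
    assumed placeholder, since the true gap size is usually only estimated.
    """
    parts = [
        reverse_complement(c.sequence) if c.reverse else c.sequence
        for c in contigs
    ]
    return ("N" * gap_length).join(parts)


if __name__ == "__main__":
    scaffold = [
        PlacedContig("contig_1", "ACGT"),
        PlacedContig("contig_2", "AAAC", reverse=True),
    ]
    print(scaffold_sequence(scaffold, gap_length=3))
```

The list order captures "ordered" and the `reverse` flag captures "oriented"; linking information such as paired-end reads is what would justify each choice in a real assembly.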

Narrower terms: sequence-contig scaffold, sequenced-clone-contig scaffold. Related term: contig assembly,
scanning, scoring: SNPs & other genetic variations glossary

scoring methods: Many choices; the best choice is often problem dependent. Nice review: ["Sequence Analysis: Which scoring method
should I use?" Pittsburgh Supercomputing Center, Carnegie Mellon Univ., 1999]
http://www.psc.edu/research/biomed/homologous/scoring_primer.html

Related terms: filtering, gap, masking; Molecular modeling glossary homology modeling. Narrower term: SNP scoring


sequence alignment: The arrangement of two or more amino acid or base sequences from an organism or organisms in such a
way as to align areas of the sequences sharing common properties. The degree of relatedness or homology between the sequences
is predicted computationally or statistically based on weights assigned to the elements aligned between the sequences. This in turn
can serve as a potential indicator of the genetic relatedness between the organisms. [MeSH, 1991] Broader term: alignments

sequence analysis: Sequence analysis is a robust field, and mining sequence data using bioinformatics is one of the main activities
of genomics-based drug discovery. Using sequence analysis to understand whole genomes may provide an important advantage
for groups looking for new drug targets among genes, or trying to pick the best among targets they already have. Sequence analysis
is one of the most widely used techniques in genomics. A great deal of sequence work will continue to be done as researchers fill
the gaps left in the genome maps of humans and other important organisms. Studies to confirm sequence, and to identify SNPs, will
also need to continue. [CHI Bioinformatics report]

sequence homology: The degree of similarity between sequences. Studies of amino acid and nucleotide sequences provide
useful information about the genetic relatedness of certain species. [MeSH, 1993]

Broader term: Functional genomics glossary homology; Related terms: Functional genomics glossary evolutionary
homology; Proteomics glossary regulatory homology; Molecular modeling glossary homology modeling; Structural
genomics glossary structural homology



sequence homology - amino acid: The degree of similarity between sequences of amino acids. This information is useful for the
understanding of genetic relatedness of certain species. [MeSH, 1993]

sequence homology - nucleic acid: The sequential correspondence of nucleotide triplets in a nucleic acid molecule which permits
nucleic acid hybridization. Sequence homology is important in the study of mechanisms of oncogenesis and also as an indication of
the evolutionary relatedness of different organisms. The concept includes viral homology. [MeSH, 1991]

Broader term sequence homology 

£2lr£ n r?« ^° S: S ^ uence b " s 2 f . conti 9 residue9 ,n 'ens* Used to determine the mass of a particular sequence [CHI Proteor 

S5£?™ an<1 databaS6S h ' 9h SP6CifiCity - < B,ack8t0Ck & W ^ TWboSoTSJS , h 

sequence validation: 

sequence-contig scaffold: Scaffold produced by connecting a maximal set of sequence contigs joined by bridged gaps. [Univ. of
California Santa Cruz Human Genome Project Working Draft terminology] http://genome.ucsc.edu/goldenPath/term.html

sequenced-clone-contig scaffold: Scaffold produced by joining sequenced clone contigs by bridged SCO gaps. [Univ. of California
Santa Cruz Human Genome Project Working Draft terminology] http://genome.ucsc.edu/goldenPath/term.html

sequencers - automation: The types and degree of automation needed at each [sequencing] step can vary greatly from laboratory to
laboratory. The ultimate goal of automation is not to replace the role of humans, but to increase the speed and accuracy rates of
individual steps while decreasing the number of repetitive manual steps that would otherwise be imposed on personnel. In fact, some
of the steps may still be performed manually if that step can be performed faster or more proficiently by hand (e.g., preparation of
reaction mixes, loading sequencing gels). [CHI High Throughput Genomics] Genomic Report, Dec. 2001.

Related term: online automated DNA sequencers 

sequencers - miniaturization: In the future, miniature DNA sequencers may become state of the art. Pocket or briefcase-sized
sequencers would permit onsite research to be performed in remote areas and dramatically increase the speed at which DNA
fragments are analyzed. [CHI High Throughput Genomics] Genomic Report, Dec. 2001.

Related terms: Microarrays categories: lab-on-a-chip

sequencing: (proteins, nucleic acids) Analytical procedures for the determination of the order of amino acids in a polypeptide chain
or of nucleotides in a DNA or RNA molecule. [IUPAC Compendium] 

Sequencing for mutation detection uses well-characterized genes to search for mutations and to screen for genetic abnormalities. This process can
be accelerated with the aid of oligonucleotide arrays. With this tool an individual's risk of developing certain forms of cancer or
monogenetic diseases can be estimated. Microarrays in Medicine: Arrays of Possibilities, April 26-27, 2004, Boston,
Massachusetts

Sequencing of biomolecules began with the insulin B-chain, a thirty residue peptide, which Sanger and Tuppy deduced through a combination of limited proteolysis and
chemical analysis in 1951. It was a full 14 years later that Holley et al. determined the sequence of alanine tRNA from yeast. And it took another 12 years until "real" DNA
sequencing was developed by Maxam & Gilbert and Sanger et al. in 1977. [Introduction to bioinformatics, Univ. of Munich Gene Center, Germany, Summer 2000]
Mfr://www.tml?.uri^^ 01 l.html 

Largely automated now. Full DNA sequencing is the "gold standard" for genotyping.

Narrower terms: resequencing, sequencing - algorithms, sequencing, advanced, sequencing - cost of, sequencing - high
throughput, sequencing - throughput, shotgun sequence, single DNA molecule sequencing, whole genome shotgun
sequencing, chain termination sequencing, chemical cleavage sequencing, chemical degradation sequencing, de novo
sequencing, dideoxy sequencing, microsequencing, minisequencing, multiplex sequencing, Sanger sequencing. Related
terms: genotyping, haplotyping

sequencing, advanced: Jay Shendure et al., Advanced Sequencing Technologies: Methods and Goals, Nature Reviews Genetics
5: 335, May 2004 http://arep.med.harvard.edu/PGP/Shendure04.pdf

sequencing algorithms: See BLAST, FASTA, Needleman-Wunsch, Smith-Waterman

sequencing - cost of: The cost of sequencing a single DNA base [when the Human Genome Project was initiated] was about $1
then; today, sequencing costs have fallen about 100-fold to $.10 to $.20 a base and still are dropping rapidly. [Human Genome News
11 (1-2) Nov. 2000] http://www.ornl.gov/hgmis/publicat/hgn/v11n1/01giants.html
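
As a rough worked example of what the quoted per-base figures imply, the sketch below multiplies them out for a single pass over a genome. The genome size of about 3 billion bases is an assumption added for illustration and is not stated in the entry; the calculation also ignores the redundant coverage that real projects require.

```python
# Assumed approximate size of the human genome in bases (not from the entry).
GENOME_BASES = 3_000_000_000


def single_pass_cost_dollars(bases: int, cents_per_base: int) -> int:
    """Cost in whole dollars of sequencing each base once at the given price.

    Integer cents are used to avoid floating-point rounding noise.
    """
    return bases * cents_per_base // 100


# Per-base prices quoted in the entry: $0.10 to $0.20.
low = single_pass_cost_dollars(GENOME_BASES, 10)
high = single_pass_cost_dollars(GENOME_BASES, 20)

if __name__ == "__main__":
    print(f"${low:,} to ${high:,}")
```

Under these assumptions a single pass would cost on the order of hundreds of millions of dollars, which is why the continuing fall in per-base cost mattered so much.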

sequencing - high-throughput: Uses robotics, automated DNA-sequencing machines and computers.

sequencing - throughput: Production of genome sequence has skyrocketed over the past year, with more than 90 percent of the
sequence having been produced in the past 15 months alone. Because of this increased capacity, the next phase is expected to i
much more rapidly than previously expected. [NHGRI, "International Human Genome Sequencing Consortium Publishes Sequence
and Analysis of the Human Genome" Washington, D.C., February 12, 2001] http://www.nhgri.nih.gov/NEWS/initial_sequence

shotgun sequencing method: Sequencing method which involves randomly sequencing tiny cloned pieces of the genome, with no
foreknowledge of where on a chromosome the piece originally came from. This can be contrasted with "directed" [sequencing]
strategies, in which pieces of DNA from adjacent stretches of a chromosome are sequenced. Directed strategies eliminate the need
for complex reassembly techniques. Because there are advantages to both strategies, researchers expect to use both random (or
shotgun) and directed strategies in combination to sequence the human genome. [DOE]

Uses dynamic programming methods. 

Narrower terms: chromosome-specific shotgun sequencing, protein shotgun sequencing, whole genome shotgun 
sequencing, YAC shotgun sequencing; Related term: full shotgun coverage 

Shotgun sequencing comes of age, Tabitha Powledge, Scientist, Dec. 31, 2002 http://www.biomedcen
Hybrid of whole genome shotgun and clone-by-clone approach is probably best.
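
The "complex reassembly" that shotgun sequencing requires can be illustrated with a toy assembler. The sketch below uses a simple greedy suffix-prefix merge, which is a stand-in chosen for brevity and not the dynamic programming approach the entry mentions or the method of any production assembler; the reads, function names, and minimum overlap of 3 are all hypothetical, and the reads are assumed to be error-free and from one strand.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the list of remaining contigs; a single element means every
    read was joined into one sequence.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    # Hypothetical reads drawn from the sequence ATGGCGTACGTTAGC, given out of order.
    shotgun_reads = ["ACGTTAGC", "ATGGCGTA", "CGTACGTT"]
    print(greedy_assemble(shotgun_reads))
```

Greedy merging can misassemble repeated regions, which is one reason real genomes need paired-end information and, as the note above suggests, hybrid strategies.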

similarity: Functional genomics glossary

similarity search: BLAST, FASTA and Smith-Waterman are examples of similarity search algorithms.

single base extension array: Single Base Extension [SBE], or mini-sequencing, was one of the first genotyping technologies
developed, and it is categorized as a type of primer extension method. SBE is very similar to DNA sequencing, with the exception that
only one base (the polymorphic base or SNP) is queried, while sequencing can identify up to several hundred bases and determine
their relative order. The system is cheaper and can be performed in higher throughput than traditional sequencing. CHA Cambridge
Healthtech Advisors, Clinical Genomics: The Impact of Genomics on Clinical Trials and Medical Practice report, 2004

single DNA molecule sequencing: The evolution of technology for single DNA molecule sequencing will ultimately permit whole
genome analysis of populations of cells at high resolution and will obviate current PCR-based approaches, particularly important
sequencing diploid or polyploid cells. This is the ultimate in sensitivity, and perhaps difficulty. Further in the future, it might be
possible to utilize the protein-synthesis machinery of the cell as a "sequencing engine." [National Center for Research Resources
"Integrated Genomics Technologies Workshop Report" Jan. 1999]

Smith-Waterman alignment: An amino acid sequence alignment that illustrates sequence similarity. The alignment is generated
using the Smith-Waterman algorithm (Temple Smith and MS Waterman, J Mol Biol. 147: 195-197, 1981; WR Pearson, Genomics
11: 635-650, 1991) [SGD Saccharomyces Genome Database glossary, Stanford Univ.]
http://genome-www.stanford.edu/Saccharomyces/help/glossary.html



Related terms: dynamic programming; Algorithms glossary, In silico & Molecular modeling glossary
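
For comparison with global alignment, a minimal sketch of Smith-Waterman style local alignment scoring follows. It computes only the best local score, not the alignment itself, and the scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions for illustration rather than the parameters used by SGD or in the cited papers.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty; all scoring values are illustrative assumptions.
    """
    m = len(b)
    prev = [0] * (m + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment start anywhere (local alignment).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # The shared segment ACGT is found despite unrelated flanking sequence.
    print(smith_waterman_score("TTTACGTAAA", "GGACGTCC"))
```

The clamp at zero is the essential difference from the global method: poorly matching flanks are simply discarded instead of being penalized.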

templates: Templates used for sequencing generally come in two forms: 1) PCR products and 2) cloned DNA. PCR (polymerase chain reaction)
products are derived by the PCR process where a specific but minute portion of the genome is selectively amplified 1 billion-fold from
the source DNA (usually a complete genome). PCR products are generated when only a small but discrete portion of the genome
needs to be analyzed in hundreds or thousands of individuals. ... DNA clones are derived from cutting large portions of DNA
(sometimes a complete genome) into discrete fragments that are "cloned" or inserted into DNA vectors. ... A single DNA fragment is
inserted into a vector and transferred into a host (generally E. coli bacteria) where the vector and the DNA fragment replicate as the
host replicates, thus producing mass quantities of a single DNA fragment that can be purified from the host. [CHI High Throughput
Genomics] Genomic Report, Dec. 2001.

viral genotyping: Genomic data is enabling researchers to predict a patient's response to therapy based on the viral genotype for
viral infections. HIV genotyping is an early example of how treatment decisions are made based on the genotype of the virus.

viral homology: See under sequence homology - nucleic acid

whole genome shotgun sequencing: Celera's whole genome shotgun sequencing technique involves sequencing from both ends of
the double stranded cloned DNA. Celera's accurately paired clone end sequences are a key tool for assembling the genome much
more completely than single stranded sequencing methods allow at comparable levels of sequence coverage. Celera's paired end
sequencing strategy, as part of the whole genome shotgun sequencing technique, has now produced sequence pairs from clones that
cover the human genome 11 times. The company believes that 99% of the human genome is represented in the cloned DNA. ["Celera
Genomics completes sequencing phase of the genome from one human being" press release, Rockville, MD, April 6, 2000]
http://www.pecoro 

Broader term shotgun sequencing method. 

"working draft, human genome sequence": This site contains a working draft of the human genome, which is over 90% complete.
Approximately half of the sequence is in a highly accurate 'finished' state. The other half is merely 'draft' quality. Some care must be
taken interpreting draft regions, but these are still often very useful to the working scientist. We encourage you to explore the working
draft with the genome browser, which displays the work of many annotators worldwide. [Human Genome Project Working Draft, Univ.
of California, Santa Cruz, US] http://genome.ucsc.edu/

This milestone was announced at the White House (Washington DC, US) on June 26, 2000. President Bill Clinton was joined by
Francis Collins (National Human Genome Research Institute) and Craig Venter (Celera Genomics) and heads of the major US
genome sequencing centers. Work continues to be done on annotating the sequence, but further celebration ensued with publication
of two versions of the sequence in Feb. 2001.

Related terms: draft sequence, finished sequence - human, published working drafts; Genomics glossary Human Genome
Project

YAC shotgun sequencing: Yeast artificial chromosome clones are isolated from a YAC library using chromosome-specific probes
(or markers). These YACs are then used to prepare small-insert shotgun libraries. Clones from these libraries are then submitted to
sequencing. Plasmodium falciparum Genome Project Sequencing Strategies, NCBI, US, 2001
ytp:/ftA^.ncbi.^ 

Bibliography

CHI Predictive Pharmacogenomics report

DNA Sequencing glossary, Geospiza Inc., 2002, 26 definitions. http://www.geospiza.com/

Guide to Molecular Sequence Analysis Glossary, Andrew Louka, 2001, 35 definitions 
http:/A*nvw. bmnel.acxuk/depts/b^ guide/glossary, html 

NCBI (US) BLAST Glossary, 2000, 40+ definitions. http://www.ncbi.nlm.ni

Human Genome Project Information, Facts about Genome Sequencing, Oak Ridge National Lab, US, 2002 
http://www.ornl.gov/hgmis/faq/seqfacts.html
How to look for other unfamiliar terms

IUPAC definitions are reprinted with the permission of the International Union of Pure and Applied Chemistry.